INSIM1 is a computer program, written in LISP, which
models simple forms of learning analogous to the learning of
a human infant during the first few weeks of his life, such
as learning to suck the thumb and learning to perform elementary
hand-eye coordination.

The program operates by discovering cause-effect relationships
and arranging them in a goal tree. For example, if A
causes B, and the program wants B, it will set up A as a subgoal,
working backward along the chain of causation until it reaches a
subgoal which can be reached directly, i.e. a muscle pull.

Various stages of the simulated infant's learning are 
described. 


INSIM1:

A Computer Model of Simple Forms of Learning


I. Introduction and Summary 
INSIM1 is a computer program, written in LISP, which models
simple forms of learning analogous to the learning of a human
infant during the first few weeks of his life, such as learning
to suck the thumb and learning to perform elementary
hand-eye coordination.

The program operates by discovering cause-effect relationships
and arranging them in a goal tree. For example, if A
causes B, and the program wants B, it will set up A as a
subgoal, working backward along the chain of causation until it
reaches a subgoal which can be reached directly, i.e. a muscle
pull.
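
The backward-chaining idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each effect has a single known cause, and the names `backward_chain`, `causes`, and `responses` are hypothetical rather than taken from INSIM1, which is written in LISP and PSIM.

```python
def backward_chain(goal, causes, responses):
    """Work backward from `goal` to a directly executable response.

    `causes` maps each effect to its (single) known cause.
    `responses` is the set of directly controllable variables.
    Returns the chain [goal, ..., response], or None if no chain exists.
    """
    chain = [goal]
    seen = {goal}
    while chain[-1] not in responses:
        cause = causes.get(chain[-1])
        if cause is None or cause in seen:
            return None  # no known cause, or a causal loop
        chain.append(cause)
        seen.add(cause)
    return chain


# "A causes B" and "a muscle pull causes A", as in the example in the text.
causes = {"B": "A", "A": "muscle pull"}
print(backward_chain("B", causes, {"muscle pull"}))
```

Reading the returned chain from right to left gives the order in which the subgoals would be pursued.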

A typical problem is the one-dimensional, three-point
thumb-sucking problem, which can be described in logical notation as
follows:

(1) $\text{object touching mouth} \rightarrow \text{pleasure}$

(2) $(\text{left cheek touch} \wedge \text{turn head left}) \rightarrow \text{mouth touch}$

(3) $(\text{right cheek touch} \wedge \text{turn head right}) \rightarrow \text{mouth touch}$

(4) left cheek touch, right cheek touch, mouth touch [connectives illegible in the source]

(5) $\text{lift hand} \rightarrow \text{face touch}$
After the program has learned these connections, it will
emit the behavior sequence "lift hand, turn head (left or
right)," resulting in pleasure.

Below is a block diagram of INSIM1:

[Figure: block diagram of INSIM1]
The performance program has the direct responsibility for
synthesizing behavior. It is written in an interpretive language
called PSIM (parallel simulator). The performance program
receives stimuli from, and sends responses to, a body and
environment simulator; the display section provides real-time
monitoring on the cathode-ray tube. The motivation section
activates the main goal (oral gratification or curiosity).

Relatively little of the performance program is innate. 
Most of it is generated by an experience-driven compiler, which 
is the core of the learning part of the program. 


Causality is detected by statistical correlation; if a
signal occurs on line A followed by one on line B, and if this
sequence is repeated sufficiently many times, the program
assumes that A causes B. The program is equipped for the simplest
type of pattern recognition and concept formation: the
formation of logical AND's and OR's of previously known variables.
The program has an intellectual motivation system which
causes it to exhibit simple forms of curiosity, play, and
exploratory behavior.

II. The Performance Program

As described above, the performance program has the direct
responsibility for receiving cues from the environment and
emitting properly timed and sequenced behavior. It is coded in
PSIM, a language which will be described in detail below. The
performance program operates by activating various branches of
the goal tree at the appropriate times. In the thumb-sucking
problem, assume that the motivation section has activated the
main goal "oral gratification." The first step is to activate
the extreme left branch of the tree (the dotted line indicates
activation):

[Figure: goal tree with the extreme left branch activated]
The "lift hand" response at the bottom of this branch is
emitted to the body and environment simulator. After a delay
of roughly two simulated-time seconds a cue, e.g. "left cheek
touch," comes back, indicating that the simulated hand has been
lifted to touch the (simulated) left cheek. Next, the branch
ending in "turn head left" is activated:

[Figure: goal tree with the "turn head left" branch activated]
A "mouth touch" signal comes back from the body and environment
simulator, indicating that this goal has been reached; the
motivation section activates the oral gratification flag,
"rewarding" the program for its successful effort.

The basic problem is to decide which branch of the goal tree
to activate. (INSIM1 performance programs allow only one branch
to be active at a time; hence there is no way to work on two
goals simultaneously.) In a given situation, the decision is
made in two phases, a feasibility study phase and a choice phase.

In the feasibility study phase, each path to the main goal
is assessed, and an estimate is made of which path is the quickest
and surest way to the main goal. Two numerical quantities
are computed for each subgoal, GPR (global success probability)
and GC (global cost). The GPR of a subgoal is an estimate of
the conditional probability that, if the program attempts to
achieve the subgoal, it will succeed in reaching it.

A. Computation of GPR and GC

This section is devoted to a detailed discussion of how GPR
and GC are computed. On a first reading, readers may skip to
Section B on the choice phase.

GPR is defined recursively as follows:

(1) For a "response" (directly controllable) variable,
such as "lift hand" or "turn head left", $\mathrm{GPR} = 1$.

(2) Suppose that A is one of several OR'ed subgoals of B;
further, suppose that A is the "best" subgoal of B in the sense
that it maximizes the Slagle coefficient ( ):

$$\frac{\mathrm{GPR}(B)}{\mathrm{GC}(B)}$$

Then $\mathrm{GPR}(B) = \mathrm{GPR}(A)\,\Pr(B \mid A)$, where
$\Pr(B \mid A)$ is the conditional probability of B given A (i.e.
the probability of getting from A to B) as estimated by the
coefficient learning program PRDLRN, discussed below.

(3) Suppose that A1 and A2 are components of the ordered-AND
goal A1THA2 (A1 then A2). Then
$\mathrm{GPR}(\mathrm{A1THA2}) = \mathrm{GPR}(\mathrm{A1}) \cdot \mathrm{GPR}(\mathrm{A2})$.

(4) Notwithstanding any of the above, if a goal has
already been achieved, its $\mathrm{GPR} = 1$. A goal is defined as
"already achieved" if the corresponding signal has occurred within
the last five seconds.

Similarly, the GC (time delay) of a subgoal is defined
recursively as follows:

(1) For a response variable, $\mathrm{GC} = 0$.

(2) If A is the best of several OR'ed subgoals of B,
then $\mathrm{GC}(B) = \mathrm{GC}(A) + \mathrm{GPR}(A)\,\mathrm{Delay}(A \rightarrow B)$.

(3) The GC of an ordered-AND goal A1THA2 (A1 then A2) is
$\mathrm{GC}(\mathrm{A1THA2}) = \mathrm{GC}(\mathrm{A1}) + \mathrm{GPR}(\mathrm{A1}) \cdot \mathrm{GC}(\mathrm{A2})$.

(4) Notwithstanding any of the above, the GC of a goal is
0 if the goal is already achieved (in the past five seconds).
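
The two recursive definitions above can be tried out with a small Python sketch. It is an illustration, not the INSIM1 code: the classes `Response`, `Then`, and `Or`, the function `evaluate`, and all numeric values are hypothetical. Two details are assumptions of the sketch rather than statements from the text: a candidate path with zero cost is treated as having an infinite Slagle coefficient, and an OR goal with no subgoals gets GPR 0 and infinite GC.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Response:
    """Directly controllable variable, e.g. "lift hand"."""
    name: str
    achieved: bool = False


@dataclass
class Then:
    """Ordered-AND goal: `first` then `second`."""
    first: object
    second: object
    achieved: bool = False


@dataclass
class Or:
    """Goal B with OR'ed subgoals; each link is (A, Pr(B|A), Delay(A->B))."""
    name: str
    links: List[Tuple[object, float, float]] = field(default_factory=list)
    achieved: bool = False


def evaluate(goal):
    """Return (GPR, GC) for `goal` using rules (1)-(4)."""
    if goal.achieved or isinstance(goal, Response):
        return 1.0, 0.0                      # rules (4) and (1)
    if isinstance(goal, Then):               # rule (3)
        p1, c1 = evaluate(goal.first)
        p2, c2 = evaluate(goal.second)
        return p1 * p2, c1 + p1 * c2
    best = (-1.0, 0.0, float("inf"))         # (ratio, GPR, GC); assumed default
    for sub, pr, delay in goal.links:        # rule (2)
        p, c = evaluate(sub)
        gpr, gc = p * pr, c + p * delay
        # Assumption: a zero-cost path counts as infinitely good.
        ratio = gpr / gc if gc > 0 else float("inf")
        if ratio > best[0]:
            best = (ratio, gpr, gc)
    return best[1], best[2]


# Illustrative numbers only: Pr and Delay values are made up.
lift = Response("lift hand")
turn = Response("turn head left")
cheek = Or("left cheek touch", [(lift, 0.5, 2.0)])
mouth = Or("mouth touch", [(Then(cheek, turn), 0.9, 1.0)])
print(evaluate(mouth))       # before the cheek has been touched
cheek.achieved = True
print(evaluate(mouth))       # once "left cheek touch" counts as achieved
```

With these made-up numbers, marking "left cheek touch" as achieved raises the GPR of "mouth touch" and lowers its GC, which is the effect rule (4) is meant to have.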

To summarize, in the feasibility study phase, estimates
are made of the success probability and time delay of each path
to the main goal.

B. The Choice Phase

The next step is to activate the goal tree branch which is
estimated, according to simple heuristics, to be the quickest
and surest path to the main goal. A goal is active if, and
only if, its WANT variable has the value TRUE.
C. Computation of the WANT Variables

(On a first reading, readers may skip to Section D on the
inner loop.) The WANT variable of a goal G is defined
recursively as follows:

(1) If G is a main goal, WANT(G) = TRUE or FALSE as set
by the motivation system.

(2) If A is one of several OR'ed subgoals of B,

$\mathrm{WANT}(A) = \bigl(\mathrm{WANT}(B) \wedge (A \text{ is not already achieved}) \wedge (A \text{ is the best subgoal of } B)\bigr) \vee (A \text{ is a curiosity goal (see below)})$,

where "already achieved" means that the signal A has occurred
in the last five seconds, and the "best" subgoal is that which
maximizes

$$\frac{\mathrm{GPR}(B)}{\mathrm{GC}(B)}$$

(3) If A1THA2 is the ordered-AND subgoal "A1 then A2",

$\mathrm{WANT}(\mathrm{A1}) = \mathrm{WANT}(\mathrm{A1THA2}) \wedge (\mathrm{A1} \text{ is not already achieved})$

$\mathrm{WANT}(\mathrm{A2}) = \mathrm{WANT}(\mathrm{A1THA2}) \wedge (\mathrm{A1} \text{ is achieved}) \wedge (\mathrm{A2} \text{ is not achieved})$.

(4) If G is a response (directly controllable) variable,
WANT(G) causes the response to be emitted.
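
Rules (2) and (3) are plain Boolean formulas, so they can be written down directly. The following Python sketch does that; the function and parameter names are hypothetical, and the flags for "achieved", "best", and "curiosity" are assumed to be supplied by the caller rather than computed here.

```python
def want_or_subgoal(want_b, a_achieved, a_is_best, a_is_curiosity=False):
    """Rule (2): WANT of subgoal A, one of several OR'ed subgoals of B."""
    return (want_b and not a_achieved and a_is_best) or a_is_curiosity


def want_then_parts(want_then, a1_achieved, a2_achieved):
    """Rule (3): WANT of the two components of the ordered-AND goal A1THA2."""
    want_a1 = want_then and not a1_achieved
    want_a2 = want_then and a1_achieved and not a2_achieved
    return want_a1, want_a2


# "left cheek touch then turn head left": first only A1 is wanted,
# and after the cheek has been touched only A2 is wanted.
print(want_then_parts(True, a1_achieved=False, a2_achieved=False))
print(want_then_parts(True, a1_achieved=True, a2_achieved=False))
```

Because at most one of the two components is wanted at any moment, the ordered-AND goal enforces the sequencing of the two steps.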

D. The Inner Loop

The feasibility study and choice phases are performed
every time the simulated-time clock, TCLOCK, changes.
Thus the GPR, GC, and the program's decisions are constantly
being updated on the basis of changing conditions.
The PSIM interpreter ensures reasonable efficiency by recomputing
only the variables which depend on some condition which has
changed since the last TCLOCK time.
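
One common way to recompute only what depends on a changed condition is to record, for each variable, which other variables read it. The Python sketch below shows that general technique; it is not a description of how the PSIM interpreter is built. The class `Network` and its methods are hypothetical, and the sketch assumes there are no circular dependencies.

```python
from collections import defaultdict


class Network:
    """Variables with formulas; a change is pushed only to dependents."""

    def __init__(self):
        self.values = {}
        self.formulas = {}                   # name -> (function, input names)
        self.dependents = defaultdict(list)  # input name -> dependent names
        self.recomputed = []                 # log of recomputations

    def define(self, name, func, inputs):
        """Declare a derived variable and compute its initial value."""
        self.formulas[name] = (func, inputs)
        for i in inputs:
            self.dependents[i].append(name)
        self.values[name] = func(*(self.values[i] for i in inputs))

    def set(self, name, value):
        """Store a value; recompute dependents only if it actually changed."""
        if self.values.get(name) == value:
            return
        self.values[name] = value
        for dep in self.dependents[name]:
            func, inputs = self.formulas[dep]
            self.recomputed.append(dep)
            self.set(dep, func(*(self.values[i] for i in inputs)))


net = Network()
net.set("X", False)
net.set("Y", False)
net.define("Z", lambda x, y: x and not y, ["X", "Y"])
net.define("W", lambda z: not z, ["Z"])
net.recomputed.clear()
net.set("Y", True)        # Z is recomputed but keeps its value, so W is skipped
print(net.recomputed)
```

The log shows that a change which leaves Z unchanged stops there and never reaches W.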

E. Discussion

INSIM1 performance programs incorporate simple heuristics
which work well in cases where the assumptions on which they are
based hold true.

Among the assumptions are:

(1) Success probabilities and time delays are assumed to
be statistically independent. If this is not true, the chaining
formulae used in computing success probabilities and time
delays will not be accurate.

(2) Predictions of success probability and time delay are
based on the value of variables which may change with time.
Suppose the program says that Pr(B|A) is high, and the A→B
branch of the tree is selected. Then suppose that one of the
variables used in computing Pr(B|A) changes before A is reached.
If the new value of Pr(B|A) is low, this may indicate that B
cannot be reached.

(3) It is assumed that goals do not conflict; i.e., that
the achievement of one goal does not decrease the probability
of achieving another goal.

Removing these performance limitations would require additional
machinery beyond the scope of the INSIM1 project, such
as a look-ahead method of the type used in chess programs.

F. The Experience-Directed Compiler

As mentioned previously, most of the performance program is
coded by an internal compiler which, instead of using as its
input a source code prepared by a human, is controlled by the
experience acquired by the program as it interacts with its
(simulated) environment. In keeping with the dictum that in
order to learn something, one must know something already, the
compiler incorporates the probability formulas described above,
plus knowledge of basic aspects of the physical world,
including time and causality.


The compiler consists of pattern recognizers, code generators,
and a plausible move generator (not implemented at this
writing).
The plausible move generator is used instead of testing
for causality between all possible variables A, B. The latter
approach would involve on the order of $n^2$ tests, where n is the
number of variables.

It is the compiler which sets the upper limit on the program's
ability to learn. For example, INSIM1 could never learn to play
chess even with very long training, because the necessary pattern
recognizers and code generators are not present.
The experience-driven compiler operates as follows: The
program starts out with an innate main goal, which is "oral
gratification" in the thumb-sucking problem. First, the plausible
move generator is called to generate a list of variables
which are likely to be "relevant" to the oral gratification
goal, and causality test links (indicated by dotted connections)
are formed.

Next, the causality pattern recognizer learns which test
links represent actual causal relationships. The pattern it
is looking for is:

[Figure: the causality pattern]

If a pulse on variable A is followed by a pulse on variable B
sufficiently often, A is assumed to cause B. More precisely,
if $\Pr(B \mid A) - \Pr(B \mid A \text{ or } \neg A) > 0.25$ after at least 15
pulses on A and 15 on B have occurred, A is assumed to cause B.
The pulses on A and B must be less than five simulated-time
seconds apart. (If there are any pulses at all on B, then a
pulse on A will always be "followed" by a pulse on B if we wait
sufficiently long.) Pr(B|A) is estimated by the coefficient
learning program PRDLRN, discussed below.
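
The decision rule in the preceding paragraph can be sketched in Python as follows. The function names are hypothetical. As a stand-in for the PRDLRN estimate, the sketch assumes Pr(B|A) is approximated by the plain fraction of A pulses that are followed by a B pulse within the five-second window; the baseline probability of B is assumed to be supplied by the caller.

```python
def fraction_followed(a_times, b_times, window=5.0):
    """Fraction of A pulses followed by a B pulse less than `window` s later."""
    if not a_times:
        return 0.0
    hits = sum(any(0 < b - a < window for b in b_times) for a in a_times)
    return hits / len(a_times)


def assumed_to_cause(n_a, n_b, pr_b_given_a, pr_b,
                     min_pulses=15, margin=0.25):
    """True if A is assumed to cause B under the rule in the text.

    n_a, n_b      -- number of pulses seen so far on A and on B
    pr_b_given_a  -- estimate of Pr(B | A)
    pr_b          -- estimate of Pr(B | A or not A), i.e. B regardless of A
    """
    if n_a < min_pulses or n_b < min_pulses:
        return False          # not enough evidence yet
    return pr_b_given_a - pr_b > margin


# Made-up pulse times (seconds): two of the four A pulses are followed by B.
print(fraction_followed([0, 10, 20, 30], [1, 11, 50]))
print(assumed_to_cause(15, 15, pr_b_given_a=0.9, pr_b=0.3))
```

Subtracting the baseline is what keeps a frequently pulsing B from appearing to be caused by every variable that precedes it.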

In some cases, it is sufficient to wait passively for a
pulse on A. In other cases, the curiosity section of the
performance program activates A as a goal in order to see if B
follows (e.g. it activates "turn head left" to see if "mouth
touch" follows); this is the "play" or "exploratory behavior"
mentioned above. The curiosity section attempts to test links
which are new and have not been tested many times; links where
the initial variable, A, is reasonably easy to obtain; and
where the final variable, B, is "biologically useful" (if one
may use the term to describe a computer program) in that ability
to obtain B would contribute to the program's ability to obtain
primary reward. Specifically, the curiosity section tests the
link A→B which maximizes

$$\frac{\mathrm{GPR}(A)\,\mathrm{Satfunc}(A, B)\,\mathrm{Need}(B)}{\mathrm{GC}(A)}$$
where Satfunc(A, B) (saturation function) decreases linearly
from 1 to 0 as the number of times the link A, B has been tested
increases from 0 to 15. Need(B) is an index of how much the
ability of the program to obtain primary reward would be
improved by improvements in its ability to obtain B.
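
A short Python sketch of the curiosity criterion follows. Several points are assumptions of the sketch: the leading factor of the numerator is hard to read in the source and is taken here to be GPR(A); Satfunc is clamped at 0 beyond 15 tests; and, because a response variable has GC = 0, the cost is floored at a small hypothetical value `min_cost` to avoid division by zero. All names and numbers are illustrative.

```python
def satfunc(times_tested, saturation=15):
    """Falls linearly from 1 to 0 as the test count goes from 0 to 15."""
    return max(0.0, 1.0 - times_tested / saturation)


def curiosity_score(gpr_a, gc_a, times_tested, need_b, min_cost=0.1):
    """GPR(A) * Satfunc(A, B) * Need(B) / GC(A), with an assumed cost floor."""
    return gpr_a * satfunc(times_tested) * need_b / max(gc_a, min_cost)


def pick_link(links):
    """Return the candidate link with the highest curiosity score."""
    return max(
        links,
        key=lambda l: curiosity_score(
            l["gpr_a"], l["gc_a"], l["tested"], l["need_b"]
        ),
    )


links = [
    {"name": "turn head left -> mouth touch",
     "gpr_a": 1.0, "gc_a": 0.0, "tested": 3, "need_b": 1.0},
    {"name": "lift hand -> face touch",
     "gpr_a": 1.0, "gc_a": 0.0, "tested": 15, "need_b": 1.0},
]
print(pick_link(links)["name"])
```

A link that has already been tested 15 times scores zero, so attention shifts to links that are still new.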

When the causality pattern recognizer detects that two
variables, A and B, are causally related, the corresponding
code generator is called to compile the link A→B in the goal
tree. This code generator is a LISP function called
MAKEORGOAL (A, B), so named because it also handles the case
where A is one of several logically OR'ed goals. In LISP, the
code generator turns out to be a straightforward and rather
prosaic, if slightly long, program. Separate sections are
provided for compiling the entries for WANT, GPR, GC, and each
variable associated with the curiosity system. Each section
looks up the names of the variables involved in the formula in
question and substitutes them into the formula, using LISP's
symbol-substituting capability.
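
The substitution step can be imitated in Python with nested tuples standing in for S-expressions. This is a sketch of the general idea only: the function `substitute`, the template, and the operator names `TIMES`, `GPR`, and `PR` are hypothetical and are not the text of MAKEORGOAL.

```python
def substitute(template, bindings):
    """Replace placeholder symbols in a nested-tuple formula template."""
    if isinstance(template, str):
        return bindings.get(template, template)
    return tuple(substitute(part, bindings) for part in template)


# Hypothetical template for GPR(B) = GPR(A) * Pr(B|A).
GPR_TEMPLATE = ("TIMES", ("GPR", "A"), ("PR", "B", "A"))

entry = substitute(GPR_TEMPLATE, {"A": "LIFTHAND", "B": "FACETOUCH"})
print(entry)
```

The same template can be reused for every newly discovered link, which is why a code generator of this kind stays simple.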

In the thumb-sucking problem, the program first learns the
links:

[Figure: the first links learned]

Although this version of the performance program will sometimes
succeed in obtaining "mouth touch," it does not yet know
which way to turn the (simulated) head.

Next, the plausible move generator is called to provide a
list of variables to be THEN'ed with the partially successful
subgoals. Causality test links are compiled for the ordered-AND
variables. Among them are:

[Figure: causality test links for the ordered-AND variables]
V1 correlates very poorly with mouth touch; V2 correlates
very well. Since Pr(mouth touch | V2) is very high, the
performance program will activate this branch, rather than the
others, and the simulated infant will emit "turn head left" in
response to "left cheek touch." Similarly, it learns to emit
"turn head right" in response to "right cheek touch."


What is happening here is that the conditional probability
figures, such as Pr(mouth touch | turn head left), are being used
as a hill-climbing criterion in program space (Minsky, ).
The link (turn head left → mouth touch) works some of the time;
INSIM1 forms new properties of the problem by combining properties
which have proved useful in the past (Minsky, ).

Finally, "face touch" is identified as a "biologically
useful" variable, and the program learns to activate "lift
hand"; when the (simulated) hand touches the face, the previously
learned program takes over and completes the thumb-sucking
operation.

It is interesting to note the similarity between this
learning sequence and Piaget's observations on the learning of
human infants. Although the real infant's learning is much
more complicated, it follows the same gross sequence of stages:
the real infant first learns to search from left to right with
its head; then it learns which way to turn; then it learns to
lift its hand and suck its thumb.

The remainder of this chapter is devoted to a discussion of
the PSIM interpreter and the PRDLRN coefficient-learning program;
it may be skipped on a first reading.
G. PSIM (parallel simulator)

The PSIM interpreter, embedded within LISP, handles the
details of arranging the second-by-second occurrence of simulated
events and relieves the compiler of the need to schedule
the sequence of computations. A PSIM program consists of a
set of variables, each of which has an S-expression which
determines its value. E.g.:

(Z (AND X (NOT Y)))

(X (POISSON 0.1))

(Y (POISSON 0.1))

The POISSON expressions generate Poisson-distributed pulse
trains with mean frequency 0.1 pulses per second. Whenever a
variable, such as X, changes, the variables which depend on it
are automatically updated. A graph of X, Y, and Z versus simulated
time will look something like this:

[Figure: X, Y, and Z versus simulated time]
PSIM also handles the complications which arise when the
goal tree is circular; in this case, an iteration procedure is
used to calculate the GPR, GC, and WANT variables.
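
The three-variable PSIM example above (Z is X AND NOT Y, with X and Y as Poisson pulse trains) can be imitated in Python to get a feel for the resulting traces. This is a rough sketch, not PSIM: it assumes one-second time steps, treats a "pulse" as the event that at least one Poisson arrival falls in a step, and uses the hypothetical names `simulate`, `rate`, and `seed`.

```python
import math
import random


def simulate(seconds, rate=0.1, seed=0):
    """Return rows (t, X, Y, Z) for `seconds` one-second steps."""
    rng = random.Random(seed)
    # Probability of at least one Poisson arrival in a 1 s step.
    p = 1.0 - math.exp(-rate)
    rows = []
    for t in range(seconds):
        x = rng.random() < p
        y = rng.random() < p
        z = x and not y          # (Z (AND X (NOT Y)))
        rows.append((t, x, y, z))
    return rows


rows = simulate(60)
# Crude text graph: "|" marks a pulse, "_" marks no pulse.
for name, col in (("X", 1), ("Y", 2), ("Z", 3)):
    print(name, "".join("|" if row[col] else "_" for row in rows))
```

Z pulses exactly in those steps where X pulses and Y does not, so its trace is a subset of the X trace.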


H. PRDLRN (Probability-Delay Learner)


Conditional probabilities and time delays are estimated by
a rather orthodox coefficient learning procedure (Minsky, ).
Suppose there is a link between A and B. Whenever A occurs,
followed within five seconds by B, Pr(B|A) is incremented
by an amount $\theta\,(1 - \text{old value of } \Pr(B \mid A))$, and Delay(A→B) is
incremented by $\theta\,(\text{actual delay} - \text{old estimate of delay})$. If
A occurs, but not B, Pr(B|A) is decremented by an amount
$\theta\,(\text{old value of } \Pr(B \mid A))$ and Delay(A→B) is incremented by
$\theta\,(5 \text{ seconds} - \text{old estimate of delay})$. It can be shown that this
procedure gives an unbiased estimate of Pr(B|A) and Delay
(A→B), with an exponential weighting such that old occurrences
of A affect the estimates less than new ones. $\theta$, the delay
coefficient, is currently 0.1.
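
The update rule just described is an exponentially weighted running average, and it fits in a few lines of Python. The sketch below is illustrative: the function name `update` and its parameters are hypothetical, and the coefficient symbol, which is garbled in the source, is written as `theta`.

```python
WINDOW = 5.0   # seconds within which B must follow A
THETA = 0.1    # learning coefficient, per the value quoted in the text


def update(pr, delay, b_followed, actual_delay=None, theta=THETA):
    """Return the new (Pr(B|A), Delay(A->B)) after one occurrence of A.

    b_followed   -- True if B occurred within WINDOW seconds of A
    actual_delay -- observed delay in seconds (needed when b_followed)
    """
    if b_followed:
        pr += theta * (1.0 - pr)
        delay += theta * (actual_delay - delay)
    else:
        pr -= theta * pr
        delay += theta * (WINDOW - delay)
    return pr, delay


pr, delay = 0.5, 2.0              # made-up starting estimates
pr, delay = update(pr, delay, True, actual_delay=1.0)
print(pr, delay)
pr, delay = update(pr, delay, False)
print(pr, delay)
```

Each call moves the estimates a tenth of the way toward the latest observation, so the influence of any one occurrence of A decays geometrically as newer ones arrive.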